Using J48 Tree Partitioning for scalable SVM in Spam Detection

نویسندگان

  • Mohammad-Hossein Nadimi-Shahraki
  • Zahra S. Torabi
  • Akbar Nabiollahi
چکیده

Support Vector Machines (SVM) is a state-of-the-art, powerful algorithm in machine learning which has strong regularization attributes. Regularization points to the model generalization to the new data. Therefore, SVM can be very efficient for spam detection. Although the experimental results represent that the performance of SVM is usually more than other algorithms, but its efficiency is decreased when the number of feature of spam is increased. In this paper, a scalable SVM is proposed by using J48 tree for spam detection. In the proposed method, dataset is firstly partitioned by using J48 tree, then, features selection are applied in each partition in parallel. Consistently, selected features are used in the training phase of SVM. The propose method is evaluated conducted some benchmark datasets and the results are compared with other algorithms such as SVM and GA-SVM. The experimental results show that the proposed method is scalable when the number of features are increased and has higher accuracy compared to SVM and GA-SVM.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Detection of Spam Email by Combining Harmony Search Algorithm and Decision Tree

Spam emails is probable the main problem faced by most e-mail users. There are many features in spam email detection and some of these features have little effect on detection and cause skew detection and classification of spam email. Thus, Feature Selection (FS) is one of the key topics in spam email detection systems. With choosing the important and effective features in classification, its p...

متن کامل

A Comparative Study for Email Classification

Email has become one of the fastest and most economical forms of communication. However, the increase of email users have resulted in the dramatic increase of spam emails during the past few years. In this paper, email data was classified using four different classifiers (Neural Network, SVM classifier, Naïve Bayesian Classifier, and J48 classifier). The experiment was performed based on differ...

متن کامل

Sentiment Based Twitter Spam Detection

Spams are becoming a serious threat for the users of online social networks especially for the ones like of twitter. twitter’s structural features make it more volatile to spam attacks. In this paper, we propose a spam detection approach for twitter based on sentimental features. We perform our experiments on a data collection of 29K tweets with 1K tweets for 29 trending topics of 2012 on twitt...

متن کامل

A Comparative Study of Classification Algorithms for Spam Email Data Analysis

In recent years email has become one of the fastest and most economical means of communication. However increase of email users has resulted in the dramatic increase of spam emails during the past few years. Data mining -classification algorithms are used to categorize the email as spam or non-spam. In this paper, we conducted experiment in the WEKA environment by using four algorithms namely I...

متن کامل

Email Classification Using Machine Learning Algorithms

Email has become one of the frequently used forms of communication. Everyone has at least one email account. Inflow of spam messages is a major problem faced by email users. Currently there are many spam filtering techniques. As the spam filtering techniques came up, spammers improved their methods of spamming. Thus, an effective spam filtering technique is the timely requirement. In this paper...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Computer and Information Science

دوره 8  شماره 

صفحات  -

تاریخ انتشار 2015